BamTools: a C++ API and toolkit for analyzing and managing BAM files

Authors

  • Derek W. Barnett
  • Erik K. Garrison
  • Aaron R. Quinlan
  • Michael Strömberg
  • Gabor T. Marth
Abstract

MOTIVATION: Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging and sorting). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research.

RESULTS: We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit.

AVAILABILITY: BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools.

Related resources

PREPRINT The Scramble Conversion Tool

Motivation: The reference CRAM file format implementation is in Java. We present “Scramble”: a new C implementation of SAM, BAM and CRAM file I/O. Results: The C API for CRAM is 1.5–1.7x slower than BAM at decoding, but 1.8–2.6x faster at encoding. We see file size savings of 40–50%. Availability: Source code is available from http://sourceforge.net/projects/staden/files/io_lib/ Contact: jkb@s...

Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can ...

h5vc: scalable nucleotide tallies with HDF5

SUMMARY As applications of genome sequencing, including exomes and whole genomes, are expanding, there is a need for analysis tools that are scalable to large sets of samples and/or ultra-deep coverage. Many current tool chains are based on the widely used file formats BAM and VCF or VCF-derivatives. However, for some desirable analyses, data management with these formats creates substantial im...

Berkeley Continuous Media Toolkit Api

The Berkeley Continuous Media Toolkit provides low-level, modular tools for developing distributed continuous media (CM) applications. The programming interface to the toolkit requires application developers to create and manage objects required to play back audio and video. These objects are distributed to different processes possibly on different hosts. This paper presents an application progra...

chopBAI: BAM index reduction solves I/O bottlenecks in the joint analysis of large sequencing cohorts

Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g. when analyzing a small genomic region across a large number of samples. The largest I/O burden is, however, often not imposed by the amount of data needed for the analysis but rather by index files that help retrie...

Journal title:
  • Bioinformatics

Volume 27, Issue 12

Publication date: 2011